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Abstract 

In this paper we examine a number of models that generate random 
fractals. The models are studied using the tools of computational com- 
plexity theory from the perspective of parallel computation. Diffusion 
limited aggregation and several widely used algorithms for equilibrat- 
ing the Ising model are shown to be highly sequential; it is unlikely they 
can be simulated efficiently in parallel. This is in contrast to Mandel- 
brot percolation that can be simulated in constant parallel time. Our 
research helps shed light on the intrinsic complexity of these models 
relative to each other and to different growth processes that have been 
recently studied using complexity theory. In addition, the results may 
serve as a guide to simulation physics. 

Keywords: Cluster algorithms, computational complexity, diffu- 
sion limited aggregation, Ising model, Metropolis algorithm, P-completeness 



1 Introduction 



Random fractals are a major focus of investigation in statistical physics. 
Such patterns occur at equilibrium critical points and arise through a va- 
riety of non-equilibrium dynamical processes. A number of models gener- 
ate random fractals including diffusion limited aggregation (DLA) and the 
Ising model at criticality. These models have been extensively studied by 
computer simulation methods and, in some sense, they are defined by the 
algorithms that are used to simulate them. In this paper we examine such 
defining algorithms from the viewpoint of the theory of computational com- 
plexity. 

Computational complexity is the branch of theoretical computer science 
that seeks to quantify the resources required to solve problems. One of the 
main achievements of complexity theory is the identification of a hierarchy 
of complexity classes. The classes differ with respect to how the various 
resources, such as time, space and processors, scale in proportion to problem 
size. For example, does the running time increase as a logarithmic, power 
or exponential function of the problem size? Our emphasis is on parallel 
computational complexity. We seek to answer the following question: how 
do the number of processors and the amount of time required to simulate a 
system on a massively parallel computer increase with the system size? 

The motivation for this work is two-fold. First, computational complex- 
ity may serve as a guide to simulation physics. With the growing availability 
of massively parallel computers, it is important to investigate models from 
the perspective of parallel complexity. Another, perhaps more significant, 
motivation is to provide an alternative characterization of these models. An 
enormous amount of effort has gone into characterizing the morphology of 
fractal patterns via critical exponents, fractal and multifractal dimensions, 
scaling functions, and so on. Such characterizations fail to adequately distin- 
guish these models from the standpoint of what can be described intuitively 
as complexity. We believe that the intuitive notion of physical complex- 
ity is at least partially captured by the computational complexity measure 
of parallel time (with the number of processors appropriately restricted). 
This idea, in a slightly different form, has been previously proposed by Ben- 
nett g]]. 

In a nutshell, the idea is that simple objects can be generated quickly 
while complex objects require a long history for their formation. We il- 
lustrate this by comparing two random fractals. The first is Mandelbrot 
percolation ||]; an example of which is depicted in Fig. [l|. We show Man- 
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delbrot patterns require only (non-uniform) constant parallel time to gen- 
erate. Though they are fractals, there is very little interesting morphology; 
the structure on each length scale is independent of the structure on other 
length scales. Many properties of Mandelbrot percolation are susceptible to 
rigorous analysis Q. The second example is DLA |Q] that generates frac- 
tal patterns like those shown in Fig. DLA patterns are produced by a 
highly sequential algorithm that seems to require polynomial (in the size 
of the aggregate) parallel time. DLA patterns reflect a subtle interplay of 
randomness and structure on many length scales. DLA has remained largely 
refractory to theoretical analysis. Whether or not one accepts a definition of 
physical complexity in terms of computational complexity, it is interesting 
that a variety of models in statistical physics can be sharply separated from 
one another by a fundamental new yardstick. 

Our research extends a study of the complexity of a number of growth 
models. Ref. ^ is concerned with a fluid invasion model that generates 
clusters with the same statistics as DLA. This model is shown to be inher- 
ently sequential (technically, P-complete) and so it is unlikely that it can 
be efficiently simulated in parallel. Here we show that the original random 
walk dynamics for generating DLA clusters is also inherently sequential. 
In Ref. || we considered a number of other growth models — invasion per- 
colation, Eden growth, ballistic deposition and solid-on-solid growth — and 
showed that all of these models can be efficiently simulated in parallel. The 
fractal patterns associated with them can be generated on a parallel com- 
puter in a time that scales logarithmically in the system size while using 
a reasonable number of processors. Although each of these models is less 
complex than DLA, each is more complex than Mandelbrot percolation. 

Other applications of computational complexity theory to statistical 
physics have focused mainly on the existence of polynomial time sequential 
algorithms. For example, the problems of finding the exact ground states 
of spin glasses M and computing self-avoiding walks in a random environ- 
ments Q have been shown to be computationally intractable (technically, 
NP-complete) . On the other hand, a polynomial time algorithm exists for 
the random field Ising model |J. There is also work that establishes the 
complexity of finding the partition function of the Ising model and related 
spin models on arbitrary lattices either exactly 0] or approximately using 
Monte Carlo methods |Q. 

In Section § we give an introduction to computational complexity theory. 
A reader familiar with this field may want to skim this section. In Section || 
we investigate the computational complexity of the following systems: Man- 
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delbrot percolation, DLA, Metropolis dynamics for the Ising model, Wolff 
dynamics for the Ising model, and Swendsen-Wang dynamics for the Ising 
model. Section [| is devoted to a discussion of the results. 



2 Computational Complexity Background 

In this section we provide an introduction to computational complexity the- 
ory. The reader can find further information and details in a number of 



texts [12, 13] and monographs 14, 15, 16|. 



2.1 The parallel random access machine 

The theoretical model we focus on is the parallel random access machine or 
P-RAM. It is the most commonly used model in parallel computation. We 
describe the P-RAM and then relate its resource usage to the corresponding 
measures for actual parallel computers. 

The P-RAM consists of a number of processors each with local memory 
and having access to a common global random access memory. All processors 
run the same program but are distinguished by non-negative integer labels so 
that the processors may operate on their own data or skip instructions. Input 
to the machine is placed in designated, consecutive global memory locations 
as is output. The P-RAM is in the class of single- instruction multiple-data- 
stream (SIMD) models. The processors run synchronously and in each time 



step a single random access machine (RAM) instruction [17| or a global 
memory access instruction is executed by a subset of the processors. Ex- 
amples of typical instructions are 'write the contents of the accumulator to 
memory location a' and 'add the contents of the accumulator to the contents 
of register a, placing the sum in the accumulator.' 

Although many processors may read the same memory location at a 
particular time, difficulties arise if multiple processors attempt to write to 
the same location. One frequently used arbitration scheme is the concurrent 
write model in which processors are assigned a write priority. When more 
than one processor attempts to write to a given location, the processor 
with the highest priority succeeds. This model is known as the PRIORITY 



CRCW P-RAM [lq ]. We adopt this model and simply refer to it as the 
P-RAM. 

In the P-RAM model any processor can access any global memory loca- 
tion in one time step; the model allows unlimited parallelism. For this reason 
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the P-RAM serves as a convenient model for designing and analyzing paral- 
lel algorithms, for studying processor and time requirements and for proving 
lower bounds. Although the P-RAM is overly simplistic in its assumptions, 
it can nevertheless be simulated on models of parallel computation with 
more restricted connectivity such as the hypercube. These simulations usu- 
ally have a slow down of a logarithmic factor and require roughly the same 
amount of hardware as the corresponding P-RAM computations, see |lf|] for 
additional details and references. 

As an example of the utility of parallelism, consider the task of comput- 
ing the parity of n bits. Parity is the problem of determining whether there 
is an even number of l's in the input. Initially, the n bits are stored in global 
memory locations 1, ... , n. A P-RAM program that computes parity uses 
n/2 processors numbered 1, . . . ,n/2 to add n/2 pairs of bits (modulo 2) in 
parallel. That is, processor 1 adds the contents of locations 1 and 2 storing 
the result (modulo 2) in location 1, processor 2 adds locations 3 and 4 stor- 
ing the result (modulo 2) in location 2, and so on. Similarly, the resulting 
n/2 values are added pairwise (modulo 2) by n/4 processors. This process is 
repeated until, after |~log 2 n\ time steps, parity is computed. Notice, in this 
case the algorithm's output is placed in global memory cell 1 by processor 
number 1. The algorithm runs in O(logn) time using n/2 processors. 

2.2 Complexity classes 

The primary question addressed by computational complexity theory is how 
the difficulty of a computation scales with the size of the problem instance. 
Complexity theory usually focuses on decision problems. An instance of a 
decision problem is a string of bits encoding the problem; the solution is 
simply a 1 or 0. If the solution for input x is 1, we say that x is 'accepted' 
and otherwise x is 'rejected.' In this sense, a decision problem is defined by 
its set of accepted strings. The problem size, n = \x\, is the length of the 
encoded input. A simple example involving the parity problem discussed 
above is as follows: 

Parity 

Given: b\, . . . , b n , where bi G {0, 1}. 

Problem: Do an even number of the bi's have value 1? 

In this case the input is easily encoded using exactly n bits. The output is a 
1 if there are an even number of b^s with value 1 and otherwise. It is easy 
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to see that the answer may be found on a single processor computer (such as 
a RAM or more familiar desktop computer) with a running time that scales 
linearly in n by simply scanning through the bits and maintaining their sum 
modulo 2. 

We now define several important complexity classes for parallel compu- 
tation. 



Definition 2.1 

• The class AC consists of those decision problems that can be solved on 
aP-RAMinO(l) (constant) time using (polynomial) processors. 

• The class NC consists of those decision problems that can be solved on 
a P-RAM in (logn) ^ 1 ) (poly-logarithmic) time using n ^ 1 ' processors. 

• The class P (polynomial time) consists of those decision problems that 
can be solved on a P-RAM in time using processors. 

It is easy to see that AC C NC C P. It is known that AC ± NC and, 
while no proof yet exists, it is widely believed that NC ^ P. The classes in 



Def. 2.1 are robust in the sense that they may be equivalently defined with 
respect to several different computation models |15[; they are not tied to the 
P-RAM model of parallel computation. 

All of the problems considered in this paper are in the class P. It is gen- 
erally accepted that problems in the class P have feasible sequential time 
solutions. The question we pose for the fractal models is whether a polyno- 
mial time problem can be qualitatively sped up via massive parallelism. For 
the parity example, the sequential solution mentioned takes 0(n) time. The 
parallel solution outlined previously shows a P-RAM can solve this problem 
in O (log n) time using n/2 processors. Thus, the parity problem is in the 
class NC and a qualitative speed-up is achieved in the parallel setting. On 
the other hand, parity is not in AC , see [^] for example. 

We will use the terminology that problems in NC (and thus AC ) are 
'efficiently solved in parallel,' since we obtain a qualitative speed-up solving 
these problems in parallel. On the other hand, problems that are in P but 
likely not in NC are called 'inherently sequential.' The running time for 
solving an inherently sequential problem cannot be decreased from polyno- 
mial to poly-logarithmic using a polynomial number of processors. Below 
we identify a class of problems that are in P but not NC (unless it happens 
that NC = P). 
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In order to proceed we need to be able to relate problems to one another. 
This is accomplished via the notion of reduction. The idea is similar to a 
commonly used programming practice. To solve one problem, we often use 
a subroutine call to a different problem. In this sense we reduce our original 
problem to the one involved in the subroutine call. More formally, 

Definition 2.2 Let n = \x\. (Throughout, \x\ denotes the length of string 
x and a decision problem D is represented as a set of accepted strings.) 
Decision problem D\ is NC many-one reducible or NC reducible (-<) 
to decision problem D% if there exists a function f such that x € D\ if and 
only if f(x) € D2, and f can be computed on a P-RAM in (logn) *- 1 -* time 
using n 0< ^> processors. 

If D\ -i Z?2 then D\ is 'no harder' than Z?2- This is because we could solve 
D\ using an algorithm for D2, where the input to D2 is produced by an 
efficient calculation involving /. We can also compare a given problem to 
an entire complexity class, via the concept of 'completeness.' 

Definition 2.3 A decision problem D is P-complete if 

(1) D e P and 

(2) for all D' G P, D' < D. 

The P-complete problems are therefore the hardest problems in P. Based 
on these definitions, the following theorem is straightforward to obtain. 

Theorem 2.4 // any P-complete problem is in NC then NC = P. 

Thus, if the well-known conjecture in computer science that NC 7^ P holds 



(and there is lots of evidence supporting this conjecture |14j]), P-complete 
problems are inherently sequential. 

-< is a transitive relation. That is, if D\ -< D2 and D2 -< -D3, then 
D\ -< D%. Therefore, if some problem D' is shown to be P-complete and 
D' -< D then D must be as difficult to solve as D' . In this case we say D is P- 
hard. If D is also in P, it is P-complete as well. Using transitive reductions, 
a large number of P-complete problems have been identified and no efficient 
parallel solution has been found for any of them, providing evidence for the 
conjecture. In this paper we will prove that several problems from statistical 
physics are P-complete by showing that known P-complete problems reduce 
to them. 
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The fundamental P-complete problem is the circuit value problem (CVP); 
it is phrased in terms of Boolean circuits. Before describing CVP, we give 
an informal description of circuits. A Boolean circuit is a collection of con- 
nected not, and and OR gates, not gates have one input and multiple 
outputs; and and OR gates have multiple inputs and multiple outputs. The 
fan-in (fan-out) is the number of inputs (outputs) of a gate. The connec- 
tion of the gates is 'feedforward.' That is, it must be possible to number 
the gates so that the outputs of a gate are connected to the inputs of gates 
with higher numbers. Such a numbering is called a topological numbering 
and we say the gates are in topological order. In calculating outputs from 
inputs each gate computes its Boolean function just once. Sometimes gates 
other than not, and and OR are considered. The size of a circuit is defined 
as the number of gates. The depth is the longest path from an input to an 
output. 

Circuit value problem (CVP) 

Given: A compactS encoding a of a Boolean circuit together with its inputs 

and a designated output gate g. 
Problem: Does g evaluate to 1 on input x%, . . . , x n l 

Theorem 2.5 The circuit value problem is P-complete t 2Lj. 

Numerous variants of CVP are P-complete 14]. In NOR CVP the circuit 
consists entirely of NOR gates with fan-in and fan-out two. NOR CVP without 
fan-out restrictions is also P-complete for planar circuits; this version is 
called planar NOR CVP. In monotone CVP the circuit is composed of and 
and OR gates (not gates are absent) having fan-in and fan-out two. This 
problem is P-complete for arbitrary circuits but it is in NC for planar 
circuits. We shall make use of planar NOR, monotone and other restricted 
variants of CVP in Section ||. 

A proof that a problem D is P-complete via a reduction from CVP 
is tantamount to what in other contexts has been called 'computational 
universality' The dynamics of hard spheres in classical mechanics [21] and 



some cellular automata rules [14, 22 1 have been shown to be computationally 



universal. Our proofs that DLA and various Ising Monte Carlo dynamics 
are P-complete depend on showing that arbitrary logical calculations can 
be embedded in these dynamics. 



1 A compact encoding of a circuit is polynomial in the circuit size. 
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In addition to the resources of parallel time and number of processors, 
the notion of uniformity plays an important role in computations. Roughly 
speaking, a uniform solution to a problem uses the "same" program for each 
problem size whereas a non-uniform solution may use a different program for 
each size. For example, simulating the Ising critical point in three dimensions 
using conventional Monte Carlo methods is a non-uniform problem because 
the critical temperature is required as a parameter in the algorithm. As the 
system size increases, the program must contain an increasingly accurate 
value of the critical temperature.cl On the other hand, simulating DLA 
clusters is a uniform task since no fine tuning of parameters is required. The 
same can be said for other 'self-organized' critical points such as invasion 
percolation. Recently, a uniform algorithm for sampling Ising critical points 
has been developed [p^j . 



2.3 Parallel time and logical depth 

The P-RAM model and the complexity measures that are built from it are in 
some sense unphysical because unit time is assigned to a single read or write 
step. Eventually, as such a device is scaled up, the communication time 
between processors and memory dominates the running time and the unit 
time assumption fails. Indeed, in the limit of large systems all of the models 
discussed here require polynomial time to simulate on any real world device 
because all are capable of generating random patterns with correlations on 
the scale of the system size. These correlations cannot be set up without 
communication across the system and this requires polynomial time. 

Nonetheless, parallel time correctly identifies an important aspect of 
the problem which can be called 'logical depth.' The logical depth is the 
minimum number of logical operations that must be carried out in sequence 
before a problem is solved. This concept can be made rigorous by considering 



families of Boolean circuits [15]. A family of Boolean circuits, one for each 
problem size, can simulate a P-RAM programmed to solve a given problem 
and vice versa. The definitions of the complexity classes AC , NC and 
P can be stated in terms of families of Boolean circuits: the number of 
processors corresponds roughly to the size of the circuit (number of gates) 
and the parallel time roughly to the depth of the circuit (length of the longest 
path from input to output). Thus for example, a problem is in the class NC 



2 Specifically, the critical temperature must be known to accuracy L where L is 
the system size and v is the correlation length exponent. 
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if it can be solved by a uniform family of Boolean circuits having polynomial 
size and poly- logarithmic depth in the number of inputs (the problem size). 

A few comments are in order regarding the number of processors. If only 
one processor is allowed, then all the problems treated here require polyno- 
mial time. If on the other hand, the number of processors is unrestricted it 
can be shown [24] that all the problems discussed here are solvable in con- 
stant P-RAM time using exponentially many processors (or equivalently by 
circuit families with exponential size and constant depth) and again the in- 
teresting distinctions based upon parallel time disappear. Interesting results 
are found when polynomial parallelism is permitted. 



2.4 Complexity of sampling methods 

Computer scientists study decision problems whereas computational statis- 
tical physicists are usually concerned with sampling problems — generating 
states from some equilibrium or nonequilibrium distribution. Sampling algo- 
rithms require a supply of random numbers and produce as output a system 
configuration. This configuration is described by m bits representing the 
degrees of freedom of the system expressed in binary. One can extend the 
ideas of complexity theory to sampling methods by introducing probabilistic 
P-RAM's in which each processor is equipped with a register for generating 
random bits. 

Instead of producing random bits dynamically one could equivalently 
produce the required random bits in advance and include them as inputs to 
a deterministic calculation. In this way a sampling method is reduced to 
m decision problems, one for each binary degree of freedom. An example 
of such a decision problem is 'Does Ising spin s,- (1 < j < m) have value 
+1 after M iterations of the Monte Carlo procedure using random numbers 
Note that these m decision problems may be run in parallel with, in the 
worst factor of m increase in the number of processors. Therefore, the 

sampling algorithm has the same parallel time requirement up to a constant 
factor as the associated decision problem. 

In statistical physics, the problem size is conventionally identified with 
the system size; the number of bits, m required to specify a system config- 
uration. This differs from complexity theory where it is the number of bits 
required to state the problem that is identified as the problem size. The 
following definition insures that the two notions of problem size are compat- 
ible. For a given sampling method with r random inputs, o ordinary inputs 
and m outputs, we define the associated natural decision problem as follows. 
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The input is of length m + o + r. The first m bits represent the degrees of 
freedom of the system. Of these bits exactly one is a 1. The position of the 1 
specifies which degree of freedom of the system (e.g. which Ising spin) is to 
be evaluated. Since the selected degree of freedom is expressed in unary, the 
decision problem size is at least as great as the system size.u For example, to 
represent the fifth out of ten degrees of freedom our unary expression would 
be '0000100000.' The next o bits are the ordinary inputs to the problem 
expressed in a suitably compact form. These inputs might include the size 
of the lattice, the temperature, the number of iterations of an elementary 
Monte Carlo step and other relevant parameters expressed in binary nota- 
tion. The final r bits are the random bits needed for the sampling method. 
So that the answer or other potentially useful information is not built into 
these bits, we require that they be interpreted as independent random vari- 
ables that take the value 1 with probability 1/2. We restrict our attention 
to 'reasonable' sampling methods where r is bounded by a polynomial in m. 

The decision problem for a sampling method can now be studied using 
conventional computational complexity theory. It must be emphasized that 
the complexity of the decision problem is only an upper bound on the com- 
plexity of sampling a given distribution. The reason is that the decision 
problem is associated with a particular sampling method. It may be that 
an alternative method leads to a less complex decision problem. In principle 
we would like to know how the time, number of processors and number of 
random bits scale with m for the optimal sampling method. Unfortunately, 
tools for studying this question have yet to be developed. Instead, we fo- 
cus on the complexity of several known sampling methods. Nonetheless, if 
the best known sampling methods are investigated and their complexity is 
established, it is plausible that the complexity of sampling has also been 
found. (Note, proving that a particular sampling method is optimal seems 
to be a very difficult task.) 

3 Complexity of Random Fractals 

In this section we consider the following models: Mandelbrot percolation, 
diffusion limited aggregation and the Ising model. We discuss sampling 
methods for these systems and then study the parallel computational com- 
plexity of the associated decision problems. Each model generates random 

3 This helps insure that the problems considered are in P and that the number of 
processors used will be polynomial in the input size. 
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mass fractals — sets of 'occupied' sites whose number scales as a noninteger 
power of the lattice size. 



3.1 Mandelbrot percolation 

This random fractal was first described by Mandelbrot 0. It was analyzed 
by rigorous methods in Q , and was later generalized and applied as a model 



of a fractal porous medium [25, 26, 27 1. Mandelbrot percolation is defined 



on a d-dimensional lattice. It is parameterized by a rational retention factor 
Q (0 < Q < 1), a positive integer rescaling factor iV and iteration number 
k. System configurations are described by a bit at each lattice site. If 
the bit is a 1, we say the site is 'occupied.' For purposes of illustration, 
we consider the two dimensional version on an N k x N k square lattice. A 
configuration is generated in the following way: at the i th step (0 < i < k-1) 
the lattice is completely divided into, N l x N l non-overlapping squares and 
each square is independently 'retained' with probability Q. If a square is 
retained, the site(s) in it are not changed. If a square is not retained then 
all of the site(s) in it are changed to unoccupied. After k steps unoccupied 
regions with a wide range of sizes are typically created. The resulting set of 
occupied sites is a random fractal with limiting Hausdorf dimension, Dh = 
2 + (log Q) /(log N) if Dh > 0. A realization of Mandelbrot percolation with 
N = 2, Q = .9 and k = 7 is shown in Fig. |l|. 

A natural decision problem associated with Mandelbrot percolation takes 
as input random numbers, Xi with < Xi < 1. These numbers are used to 
generate 'retention bits' that are 1 if Xi < Q and otherwise. Each retention 
bits determines whether a particular square of a given size is retained. 



Mandelbrot percolation (dimension d, scale factor N, precision b) 
Given: A non-negative integer k, a designated lattice site s expressed in 
unary with |s| = N dk , a retention factor Q (0 < Q < 1) with Q represented 
by a 6-bit binary numberi and a list of (N dk — 1)/(1 — N~ d ) random numbers 
Xi with < X{ < 1 expressed 6-bit number. 

Problem: Is site s occupied by the Mandelbrot percolation process? 

The instances of Mandelbrot percolation require that the dimension, 
scale factor and precision are all fixed inputs. In terms of the discussion of 

4 Our method of producing random numbers via coin tossing suggests this coding choice. 
Such a scheme does not allow all possible rationals in the interval [0, 1) to be represented. 
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Section |2~4| relating decision ancLsampling problems, \s\ = m, [log 2 k~] +b = o 
and b(N dk - 1)/(1 - N~ d ) = rB 

A constant time P-RAM algorithm for Mandelbrot percolation is sketched 
below. First, retention bits for every square of each size are computed in 
parallel by comparing the Xi to Q. Since 6 is a constant, this can be done 
in constant time. The m retention bits for the individual sites are placed 
in memory cells 1 to m. For each site j (1 < j < m) the occupancy of j 
is determined by taking the AND of all the retention bits of the k squares 
containing j. To compute the and, all processors reading a retention bit 
write a value of into global memory cell j. This step uses km processors. 
Note, cell j is 1 if site j is occupied and otherwise. Next, the and of 
cell j and the j th place in the unary expression of s is computed; the result 
placed in cell j. Now, cell j is 1 if and only if site j is occupied and is the 
selected site. Finally, the OR of cells 1 through m is taken (by having any 
processor reading a 1 write to memory cell 1) to determine if the selected 
site is occupied. 

This P-RAM algorithm uses constant time and polynomial {km) pro- 
cessors. For problems in constant time a very strong notion of uniformity 
(meaning highly uniform), DLOGTIME, is typically imposed. The algo- 
rithm described does not seem to be DLOGTIME uniform. This is because 
for each lattice size, a different program may be needed to quickly incorpo- 
rate information about which retention bits belong to a given site. Note that 
in general proving a problem does not meet a given uniformity condition is 
very difficult. Our algorithm is NC 2 uniform. We have the following: 

Theorem 3.1 Mandelbrot percolation is in non-uniform AC . 



3.2 Diffusion limited aggregation 

Diffusion limited aggregation Q is a cluster growth model where new occu- 
pied sites are added to the growing cluster one at a time. Here we illustrate 
DLA for a two dimensional lattice with growth initiated along a line. A ran- 
dom walker is started at a random position along the top edge of an L x L 
square lattice. The walker moves until it is a nearest neighbor of an existing 
occupied site at which point it joins the cluster. Initially, the bottom edge 
of the lattice is considered occupied. If a walk fails to join the cluster, hits 
the top boundary of the lattice or is unable to move (goes off the lattice or 
encounters a site that is occupied in its first move), it is discarded. A new 

5 This is not precise as delimiters are also used in the encoding to make decoding easier. 
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random walk is started as soon as the previous walk has joined the cluster 
or been discarded; the process continues until a cluster of the desired size is 
grown. 

A natural decision problem associated with the dynamics of diffusion 
limited aggregation is defined below. 

Diffusion limited aggregation (dimension d) 

Given: Three positive integers L, Mi and M2, a designated site s expressed 
in unary with \s\ = L d and a list of random bits specifying M\ walk trajec- 
tories each of length M2 defined by a starting point on the top edge of the 
lattice together with a list of directions of motion (e.g. N, S, E and W for 
two dimensions). 

Problem: Is site s occupied by the aggregation process? 

The proof that DLA is P-complete proceeds by a reduction from a variant 
of the planar NOR circuit value problem. The reduction has a similar flavor 
to the proof that a closely related fluid invasion problem is P-complete ||, 
although there seems to be no way to make use of that proof directly. 

Theorem 3.2 Diffusion limited aggregation is P-complete. 

Proof sketch: The idea is to prescribe a sequence of walks capable of 
carrying out the evaluation of a modified (but still P-complete) version of 
the planar NOR circuit value problem. In this version of CVP the NOR gates 
have a fan-in and fan-out of two. We also allow single input OR gates with 
fan-out restricted to at most two. The circuit encoding requires that the 
gates are numbered in topological order. The encoding specifies a planar 
layout of the circuit with gates being located at grid points. Finally, the 
circuit is required to be synchronous. That is, each gate receives its inputs 
only from gates on the immediately preceding level. Gates at level one are 
the only gates that are allowed to have direct circuit inputs. It can be shown 
that this version of CVP is P-complete. 

The walks to simulate the circuit are chosen so that the cluster grows 
along linear paths of sites and bonds that play the role of wires connecting 
gates. A wire carries the value true if the cluster grows along it. Wires 
that remain unoccupied carry the value false. The gates themselves are 
represented by locations where wires meet and several parts of the growing 
cluster interact. Below we describe how logical values are propagated along 
wires and how NOR and OR gates are implemented. 
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Logical values are propagated as follows. Each wire is realized by a 
preassigned sequence of walks. These walks move to successive locations 
along the wire. Each walk moves to its assigned site along the wire and, 
if the value of the wire is true, it sticks there. Each walk reverses its 
path after reaching its assigned site. In this way if the wire carries the 
value false, the walk returns to the upper boundary and is discarded. For 
example, the first walk creating the output wire for the gate shown in Fig. || 
arrives at site 'd' from above. If 'c' is occupied this walk sticks at 'd' and the 
cluster begins to grow along the output wire. If 'c' is not occupied, the walk 
turns around and retraces its steps back to the upper boundary where it is 
discarded. Thus the cluster grows along the output if site 'c' is occupied. 
It is straightforward to have the output wire split into two separate wires. 
These then become the inputs to other gates. Note, a larger fan-out could 
be supported, however, the details of the proof become more involved. 

A NOR gadget is shown in Fig. || The solid lines and circles represent 
the input wires, the output wire and the 'power' wire. The role of the power 
wire is to provide a growing tip for the output if needed. The dashed line 
represents a walk that will stick at one of the three open circles labeled 
'a,' 'b' or 'a' The dashed walk evaluates the gate and so must not occur 
until the input and power wires have been grown to completion. Suppose 
that input 1 is true so that the corresponding segment of wire is occupied. 
Then the trajectory sticks at 'a.' If input 1 is false but input 2 is true, 
the dashed walk sticks at 'b.' Finally, if both inputs are FALSE then the walk 
sticks at 'a' The occupancy of site 'c' records the output of the gate. 

Fig. |H shows the implementation of a single input OR gate. Effectively, 
it shows how to cross the power wire (running toward the left) over a logical 
wire (running vertically). First, the logical wire is grown to site '1.' The 
walk represented by the dotted line on the right sticks at 'a' if the logical 
wire is true and sticks at 'b' if the logical wire is false. Similarly, the walk 
represented by the dotted line on the left sticks at 'c' if the logical wire is 
true and sticks at 'd' if the logical wire is false. Finally, the logical wire 
may continue to grow vertically and the power wire may continue to grow 
to the left without interfering with one another. In this way a single input 
OR is simulated. 

For three and higher dimensional lattices each gate may be separately 
supplied with its own power wire. For example, each gate has a 'column' 
dedicated to its power wire. When the wire reaches the appropriate height, 
it is routed horizontally to the desired gate. For two dimensional DLA there 
is an additional complication in arranging to have the power wire arrive 
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at each gate without interfering with the wires that carry truth values. To 
accomplish this we use a single power wire for the entire circuit. It is 'snaked' 
through the gates level by level. See the power wire in the example shown 
in Fig. ^; this example is dicussed further later. 

The discussion above shows how to pass the power wire through an OR 
gate. It is also necessary to have the power wire cross through a NOR gate. 
This can be done as shown in Fig. ||| In this figure the path of the power 
wire is numbered and the walks that bring particles to the wire are shown 
as dashed lines. Recall that exactly one of the sites 'a,' 'b' or 'c' is occupied 
during the evaluation of the gate. If 'c' is occupied, the power wire grows 
along the full path '1—8.' If instead 'b' is occupied, sites '1' and '2' are 
skipped and growth starts at '3.' In this case the walks that go to '1' and 
'2' turn around there and return to the upper boundary where they are 
absorbed. Finally, if 'a' is occupied then growth of the power wire starts at 
'6.' Thus after helping with the evaluation of a NOR gate, the power wire 
may be passed through. 

A single power wire traverses all the gates in the sequence in which they 
would be evaluated in topological order. This requires that the gates be 
arranged in levels as shown in the example in Fig. ^. The thick lines are 
circuit wires, the filled circles are NOR gates and the open circles are single 
input OR gates. The power wire traverses the gates one level at a time. 
Gates are evaluated from bottom to top and level by level along the path of 
the power wire. The lower edge of the lattice is used as a source for true 
inputs to gates. At each gate the power and input wires arrive first, then the 
gate is evaluated and finally the power wire is continued to the next gate. 
After an entire level has been simulated, outputs are grown to the succeeding 
level. The routing between levels can be accomplished by 'spreading' the 
circuit out on the lattice and then allocating a couple of horizontal channels 
to each output of a gate. An output will be grown upward to its designated 
channel, grown horizontally underneath its appropriate gate and then grown 
upward to serve as an input. The planarity of the original circuit guarantees 
that there will be no interference of walks during this routing. 

The reduction described above shows that the special instance of CVP 
we constructed is faithfully evaluated by the growth of the DLA cluster. 
Furthermore, it is an NC reduction. The key point is that the choice of 
paths for the walks is independent of the evaluation of the circuit. The full 
layout of the walks is given globally by the planar layout of the original 
circuit as outlined above and locally by Figs. ^ through [!| All calculations 
required to compute these walks can be performed in NC. □ 
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3.3 Metropolis dynamics for the Ising model 

Configurations of the Ising model are denned by spin variables, Uj, on a 
lattice where each spin may take the value —1 or +1. The conventional way 
to obtain equilibrium states of the Ising model is via the Metropolis Monte 
Carlo algorithm. At each step of the algorithm a site i is chosen at random 
and the energy change, AEi, for flipping the spin at this site is computed. 
The energy change is given by 

AEi = 2Ja t ]T Gj (1) 

<i,3> 

where the summation is over nearest neighbors of site i and J is the coupling 
energy. If AEi < the spin is 'flipped' (<jj — > — <7j), whereas if AEi > 
the spin is flipped with probability e~ AEi ^ T , where T is the temperature. 
After this procedure has been iterated sufficiently many times, the resulting 
probability distribution for the spin configurations is close to the equilibrium 
state. 

Metropolis dynamics is governed by a random list of sites and, for each 
site in the list, a random number xi with < X{ < 1 such that the site is 
flipped if Xi < e~ AE ^ T . We can define the following natural decision prob- 
lem for Metropolis dynamics. 

Metropolis dynamics (dimension d) 

Given: A positive integer L, an initial configuration of L d spins {a{\ with 
<jj G {— 1, +1}, a temperature variable Q = e~ 4J ^ T where Q is expressed as a 
6-bit binary number, a designated site s expressed in unary with \s\ = L d , a 
list of M sites and a list of M random numbers Xi with < x,- L < 1 expressed 
as a db-bit number. 

Problem: Is a s = +1 after running the Metropolis algorithm? 

Given the random numbers X{ we can assign flip variables, gi € {0, . . . , d}, 
to each site i. For example, in three dimensions the flip variables are defined 
by the inequalities 

9i = if 0<Xi<Q 3 , 
g t = l if Q 3 < Xi < Q 2 , 
g-i = 2 if Q 2 < Xi < Q and 
gi = 3 if Q < Xi < 1. 

If a site k is chosen for a possible flip at step i and A£^/4J < 3 — gi, then 
the flip is carried out; otherwise, the spin is not changed. In other words, a 
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chosen spin i will flip at step j if it has gi or more neighbors of the opposite 
sign. It is clear that the Metropolis decision problem can be NC reduced to 
a version in which the random input is expressed as a list of flip variables; 
it is this variant of the problem that we show is P-complete. 

Theorem 3.3 Metropolis dynamics is P-complete for d greater than or 
equal to 3. 

Proof sketch: The Metropolis problem is proved P-complete by a re- 
duction from monotone CVP. The circuit is first 'embedded' in a three 
dimensional lattice. The AND and OR gates are represented by sites and 
wires connecting gates by chains of sites and bonds. For a circuit having 
./V edges, it can be shown that such an embedding may be carried out in 
NC. Initially, all spins on the lattice are —1. Logical values are represented 
by spin values with +1 (—1) meaning true (false). Logical values are 
propagated along wires by the following device: spins along the wire are 
sequentially chosen for flipping and assigned the flip variable 1. If the pre- 
decessor spin along the wire is +1, the current spin will flip to +1 but if 
the predecessor spin along the wire is —1, the flip is rejected. Thus, once 
initiated, logical values propagate along wires. Wires must always be sepa- 
rated by one or more lattice spacings except where they meet at gates. Sites 
representing gates have two input wires and two output wires. After all sites 
along the input wires have taken their logical values, the gate is ready for 
evaluation. Gates are assigned the flip variable 2 (1) for an and (or) gate. 
Thus, if at least one input is true an OR gate registers true, while both 
inputs must be TRUE for an AND gate to register true. This NC reduction 
shows that we can simulate an arbitrary monotone circuit using Metropolis 
dynamics in three or more dimensions. Therefore, the Metropolis dynamics 
problem is P-complete. □ 

Note that the planar monotone circuit value problem is in NC, see [[[J] 
for a list of references regarding this problem. So, our proof does not show 
that the two-dimensional Metropolis problem is P-complete. We have been 
unsuccessful in our attempts to implement a not gate within the framework 
of Metropolis dynamics. 



The construction in Theorem p.3| relies on a special ordering of the sites 
chosen for flipping. However, we can easily extend the proof to updating 
schemes in which sweeps through the lattice are performed in a fixed or- 
der. For example, consider the case of parallel updating where first the odd 
sublattice is flipped all at once and then the even sublattice. The problem 
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statement is slightly different here since now at each time step flip variables 
are assigned to half the sites in the lattice. It is easy to keep sites inactive 
by assigning them flip variables 3. Sites are assigned flip variables 1 or 2 as 
in the above construction at the times they are to be evaluated. 



3.4 Cluster dynamics for the Ising model 



Cluster flipping algorithms due to Wolff j28| and Swendsen and Wang 1 2£ ] are 
very efficient methods for generating equilibrium states of the Ising model 
near criticality. In this section we show that natural decision problems 
associated with the Wolff and Swendsen- Wang algorithms are P-complete. 

We illustrate the Wolff algorithm on an L x L square lattice. The starting 
point is a configuration of spins, {crj}. Next the bonds of the lattice are 
independently occupied with probability p as in bond percolation. The 
occupation parameter is related to the temperature, T, according to p = 1 — 
Q with Q = e~' 1J l T and J the coupling energy between neighboring spins. 
A site u on the lattice is chosen at random and a cluster is grown from this 
site. A site v is in the cluster grown from u if there is a path from u to v 
such that all the bonds along the path are occupied and all the spins along 
the path including a v are equal to a u . The cluster of spins defined in this 
way is 'flipped' (a — > —a for each a in the cluster) which yields a new spin 
configuration. The procedure is iterated M times. If the temperature T is 
chosen to be the critical temperature and if M is sufficiently large the final 
configuration of spins is close to the equilibrium Ising critical point. At the 
Ising critical temperature, the clusters defined by the algorithm are critical 



droplets |30], 31, 32] with Hausdorf dimension, Dh equal to 15/8. 

The Swendsen- Wang algorithm is very similar to the Wolff algorithm 
except that in each step of the algorithm all connected clusters defined by the 
occupied bonds are identified. All sites of each cluster are assigned the same 
spin value. The spin values for each cluster are determined independently 
by a fair coin toss. 

For each iteration of the Wolff or Swendsen- Wang algorithm, every bond 
of the lattice is occupied with probability p equal to 1— Q. To implement this 
we utilize random numbers Xy with < Xij < 1 for each nearest neighbor 
pair (ij). The bond (ij) is occupied if X{j is greater than Q. At each time 
step a cluster is grown from the starting point according to the occupation 
variables and the current spin configuration as described above. This cluster 
is flipped and the procedure repeated M times. We can define the following 
natural problem based on Wolff dynamics. 
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Wolff dynamics (dimension d) 

Given: A positive integer L, an initial configuration of L d spins with 
Oi € {— 1, +1}, a temperature variable Q = e~ 2J / T where Q is expressed as 
a 6-bit binary number, a designated site s expressed in unary with \s\ = L d , 
a list of M sites and dML d random numbers x^ with < x%j < 1 expressed 
as a 6-bit number. 

Problem: Is a s = +1 after running the Wolff algorithm? 

Given the random numbers Xij we can assign bond occupation variables 
bij such that bij = if Xij < Q and bij = 1 otherwise. Bonds are counted as 
occupied if bij = 1. It is clear that the Wolff decision problem can be NC 
reduced to a version in which the random input is given as the by instead 
of the Xij. It is this version that we show is P-complete using a reduction 
from the planar NOR circuit value problem. 

Theorem 3.4 Wolff dynamics is P-complete. 

Proof idea: The reduction is best illustrated by a simple example. We 
sketch it for the case d equals two. Consider the planar circuit shown in 
Fig. ^ with three inputs and three NOR gates. The evaluation of this circuit 
can be reduced to the Wolff problem shown in Fig. ||. The lower case letters 
indicate occupied bonds and time steps that are arranged in alphabetical 
order. All bonds labeled 'a' are occupied only during step 1, all bonds labeled 
'b' are occupied only during step 2 and so on. Bonds that are not explicitly 
shown are never occupied. All spins on the lattice originally have the value 
+1 except those that are labeled false. A +1 spin represents true and vice 
versa. Numbers label initiation sites for cluster growth. Cluster growth is 
initiated at gates and logical constants. Site T' initiates the cluster growth 
at time step 1 and represents the true input to the circuit; site '2' initiates 
growth at time step '2' and so on. The first cluster propagates as far as 
site '4' and both sites T' and '4' (and the intermediate site) are flipped to 
— 1. Site '2' initiates the next cluster which does not propagate. Site '2' is 
flipped to +1. The cluster initiated at '3' does not propagate so that at the 
beginning of time step 4 site '4' is in the false state; the first gate has been 
properly evaluated. At time step 4 site '4' flips but nothing else happens; the 
second gate properly evaluates to true. At time step 5 a cluster propagates 
from site '5' to the output gate that is flipped to —1. The output of the 
circuit is false as it should be. 
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More generally, a NOR gate is represented by spins, and wires connecting 
gates are represented by paths of bonds and spins. All of the bonds in a 
wire are occupied at the time step during which the wire transmits its logical 
value. If the logical value is true, a cluster of up spins is propagated along 
the wire and the output end of the wire is flipped to false if this has not yet 
occurred. A gate transmits its value by initiation of a cluster and the output 
of a gate can be read off as soon as all of its predecessors have transmitted 
their values. Note that a fan-out higher than two may easily be supported. 
Sites representing gates and logical constants must not be nearest neighbors. 
It is clear that the reduction of the circuit to Wolff dynamics can be carried 
out locally and is an NC reduction. Since planar NOR CVP is P-complete, 
so is the Wolff dynamics problem. □ 

Next we turn our attention to a natural decision problem associated 
with the Swendsen-Wang algorithm. The problem statement requires ran- 
dom 'bits,' Cj equal to ±1, to be used to determine the spins in the clusters. 
Sites are given a conventional ordering. Connected clusters defined by Q 
and the variables are labeled by the lowest ordered site, I, in the cluster 
and all the spins in the cluster are assigned the value q. 

Swendsen-Wang dynamics (dimension d) 

Given: A positive integer L, an initial configuration of L d spins {<7j} with 
(Ji G {— 1, +1}, a temperature variable Q = e~ 2J l T where Q is expressed as a 
6-bit binary number, a designated site s expressed in unary with |s| = L d , a 
number of iterations M, a list of dML d random numbers x\j with < Xij < 1 
expressed as a 6-bit number and a list of ML d random bits, q. 
Problem: Is a s = +1 after running the Swendsen-Wang algorithm? 

Theorem 3.5 Swendsen-Wang dynamics is P-complete. 

Proof hint: The proof is similar to that for the Wolff problem and consists 
of a reduction from planar NOR CVP. Here again a value of d equal to two 
suffices for the reduction. Consider a two-dimensional lattice and suppose 
that the conventional ordering of sites on the lattice is from left to right and 
then from bottom to top. The occupation variables bij are chosen the same 
as for the reduction to the Wolff problem. The cluster spin variables, q, are 
— 1 for gates and true inputs at the time they transmit their values and +1 
for all other sites and all other times. □ 

For both Wolff and Swendsen-Wang dynamics, a single iteration of the 
algorithm can be accomplished in poly-logarithmic parallel time using a 
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polynomial number of processors. This is because the most complex step is 
the identification of a connected component(s), which can be carried out by 
a standard NC algorithm |33]]. More specifically, if the number of iterations 
M is set to a constant in the statement of either the Wolff problem or the 
Swendsen-Wang problem, the resulting decision problem is in NC. For the 
Metropolis algorithm, an even stronger result holds. If M equals a constant, 
the Metropolis decision problem is in AC . These conclusions are not in 
conflict with the P-completeness proofs that rely on setting M comparable 
to the size of the Boolean circuit being simulated. The P-completeness 
results show that a polynomial (in the system size) number of iterations 
of these algorithms cannot be compressed into poly-logarithmic number of 
parallel steps, unless NC = P. 

4 Discussion 

4.1 Summary of results 

We have studied the computational complexity of natural decision problems 
associated with several models in statistical physics. Our results can be 
summarized as follows: 



1. Mandelbrot percolation is in (non-uniform) AC (Theorem 3.1). 

2. Diffusion limited aggregation is P-complete (Theorem |3.2[). 



3. Metropolis dynamics for the Ising model is P-complete (Theorem |3.3|) . 

4. Wolff dynamics for the Ising model is P-complete (Theorem 3.4). 



5. Swendsen-Wang dynamics for the Ising model is P-complete (Theo- 
rem 3.5). 



4.2 Scope of the results 

It is important to understand the limitations of the P-completeness results 
for DLA and the variants of Ising dynamics. Suppose we accept the hy- 
pothesis from complexity theory that NC ^ P. In this case the particular 
dynamics discussed here for generating DLA clusters or equilibrium Ising 
configurations are inherently sequential and cannot be efficiently simulated 
in parallel. There are other ways to generate (approximate) equilibrium 
states of the Ising model or DLA clusters; our results do not imply that 
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these ways are associated with P-complete problems. However, it was pre- 
viously shown U that a second method of producing DLA clusters is also 
P-complete. It seems plausible that there are no poly-logarithmic time 
methods for sampling the DLA distribution. On the other hand, the jury 
remains out on whether it is possible to sample from an approximation to 
the equilibrium critical distribution for spin models in poly-logarithmic time. 
It would be of great interest to obtain results on the difficulty of sampling 
physically interesting distributions. 

A second limitation of the P-completeness statements is that they are 
worst case rather than average case results. For example, assuming that 
NC ^ P, we know that there exist instances of the DLA problem that cannot 
be solved in poly-logarithmic time although we do not know whether these 
'hard' instances are typical or very rare. Indeed, the instances used in the 
P-completeness proofs are atypical. If the 'hard' instances are sufficiently 
rare, we may be able to sample the distribution in poly-logarithmic time on 
average. The theory of average case complexity |}4j |35|] addresses questions 
of this kind. Unfortunately, it is not easy to see how to apply this theory to 
the present problems. 

4.3 Parallel complexity and critical slowing down 

Away from a critical point, the equilibration time of real systems with- 
out macroscopic inhomogeneities is independent of system size. Similarly, 
the Metropolis algorithm can generate good approximations to equilibrium 
configurations of the Ising model away from the critical point in constant 
parallel time since each sweep can be done in constant time and the number 
of sweeps is independent of the system size. The associated decision problem 
is in AC . 

Equilibration of many real systems becomes increasingly slow near crit- 
ical points. Typically the equilibration time at a critical point scales as L z , 
where L is the system size and z is the dynamic exponent. This phenomena, 
known as critical slowing down, also afflicts most Monte Carlo methods used 
to sample spin configurations at critical points. The dynamic exponent z is 
customarily defined for Monte Carlo dynamics if relaxation to equilibrium 
requires flipping L z+d spins. It is often said that an algorithm suffers no 
critical slowing down if z = (with possible logarithmic corrections). This 
is not a satisfactory general definition of 'absence of critical slowing down.' 
For example, imagine an algorithm for which each spin is flipped only once 
but an enormous computation is required to decide whether or not to effect 
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the flip. Alternatively, one might propose that 'absence of critical slowing 
down' means that the sequential time (computational work) is o(L d+e ) for 
any e > 0. This definition is both machine dependent and unnecessarily 
stringent. 

We propose that 'absence of critical slowing down' should be identified 
with the class NC. A sampling method suffers no critical slowing down 
if it can be run in poly-logarithmic time on a P-RAM with polynomially 
many processors. This definition is, for the most part, in agreement with 
the z = definition. If z > for the Monte Carlo methods studied here, the 
P-completeness results show that there is critical slowing down according 
to the new definition. On the other hand, if z = for either the Metropolis 
or Swendsen-Wang algorithms, there is no critical slowing down since a 
single sweep through the lattice for either of these algorithms can be done 
efficiently in parallel and z = implies a poly-logarithmic number of sweeps. 
In contrast, the Wolff algorithm suffers critical slowing down even for z = 
according to the new definition. The reason is that the average size of 
Wolff clusters scales as L" 1 ^, where 7 is the susceptibility exponent and v 
the correlation length exponent. Thus, even if z = one typically requires 
£d-7/i/ iterations of the algorithm to reach equilibrium. The P-completeness 
result shows that carrying out these iterations can almost certainly not be 
done in poly-logarithmic time using a polynomial number of processors. 

4.4 Final remarks 

In this and two previous papers || |6| we have investigated the parallel 
computational complexity of a variety of models in statistical physics. We 
have claimed that parallel complexity provides statistical physics with a 
robust and sharply defined measure that reflects some of our more intuitive 
notions of complexity. We have classified a wide variety of models into 
three broad classes: those that require constant parallel time to simulate, 
those that require poly-logarithmic time and those that require polynomial 
time. In each case we allow a polynomial number of processors. Even among 
models that generate random fractal patterns, we find representatives in each 
of these classes. Comparisons between members of different classes reveal 
that models in the higher classes generally pose a more difficult theoretical 
challenge. It would be extremely interesting to find more precise correlations 
between computational complexity and the quantities conventionally studied 
in statistical physics. 
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List of Figures 



1. A realization of Mandelbrot percolation. 

2. A realization of diffusion limited aggregation. 

3. Gadget for a NOR gate. Filled circles and connecting bold lines show 
the two input wires, the output wire and the power wire. The dotted 
line shows the path of the walk that evaluates the gate. This walk 
terminates on site 'a,' 'b' or 'a' 

4. Gadget for a single input OR gate (effectively crossing a power wire and 
a logical wire). The logical wire first arrives at '1,' then the two dotted 
walks carry the power wire across the junction, sticking at 'a' and 'c' 
respectively if the logical wire is true or at 'b' and £ d' respectively if 
the logical wire is false. 

5. Passing the power wire through a NOR gate after its evaluation. The 
growth of the wire follows the numbered sites where the first occu- 
pied site is either '1,' '3' or '6' depending on whether 'c,' 'b' or 'a' is 
occupied. 

6. Layout of NOR (solid circles) and single input OR gates (open circles) 
in levels with the power wire (thin line) traversing the gates in the 
order in which they are evaluated. 

7. A simple planar NOR circuit with three inputs and three gates used to 



illustrate Theorem 3.4. 



8. The Wolff problem that simulates the NOR circuit. Numbered sites 
represent gates and logical constants. Bonds with the same letter are 
occupied during the same time step, bonds 'a' during step 1, bonds 'b' 
during step 2 and so on. 
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